-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete finished job in the PropagateDirectory scheduleNextJob. #5269 #5400
Conversation
int subJobsCount = _subJobs.count(); | ||
while (i < subJobsCount && _subJobs.at(i)->_state == Finished) { | ||
_firstUnfinishedSubJob = ++i; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fix this sophisticated caching. Is there any reason to keep Finished Jobs?
The only place client could use the "job" is in https://github.com/owncloud/client/blob/master/src/libsync/propagatedownload.cpp#L425 and https://github.com/owncloud/client/blob/master/src/libsync/owncloudpropagator.cpp#L557 However, it checks only for Running ones, so there is no problem. |
@@ -188,18 +188,18 @@ class OWNCLOUDSYNC_EXPORT PropagateDirectory : public PropagatorJob { | |||
QScopedPointer<PropagateItemJob>_firstJob; | |||
|
|||
// all the sub files or sub directories. | |||
QVector<PropagatorJob *> _subJobs; | |||
QList<PropagatorJob *> _subJobs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is good, https://marcmutz.wordpress.com/effective-qt/containers/
@ogoffart might have to say more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, didnt know that QList is Qt implementation (of array sth?). QLinkedList then? I just need to be able to delete from early begining of the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete from the front with QVector and we might want to go for memory over speed here. But I don't have confidence of the best choice here. QVector should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks good to me.
|
||
while (subJobsIterator.hasNext()) { | ||
subJobsIterator.next(); | ||
if (subJobsIterator.peekPrevious()->_state != Finished && subJobsIterator.peekPrevious()->parallelism() != FullParallelism) { | ||
return WaitForFinished; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: Would prefer const auto job = subJobsIterator.next();
and then use job
over subJobsIterator.peekPrevious()
. Usage of peekPrevious()
makes me think that something special is happening. Same in the other loop below.
I'd even be fine calling subJobsIterator
it
.
@@ -188,18 +188,18 @@ class OWNCLOUDSYNC_EXPORT PropagateDirectory : public PropagatorJob { | |||
QScopedPointer<PropagateItemJob>_firstJob; | |||
|
|||
// all the sub files or sub directories. | |||
QVector<PropagatorJob *> _subJobs; | |||
QList<PropagatorJob *> _subJobs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete from the front with QVector and we might want to go for memory over speed here. But I don't have confidence of the best choice here. QVector should be fine.
@ogoffart Do you see anything from logic perspective ? Should we check it in all possible scenarious to find out? |
@mrow4a I looked at it from a
I didn't see any other issues. |
@ckamm I think the simplest way could be to track _jobsFinished, it is very self explaining and easy to understand for other developer. I did not really think of any other solution because that one was seeming quite nice. I agree on append to increase the counter, true. BUt I dont think it is safe to do it while iterating, maybe we should comment it instead or assert that if running append is not possible? BTW: If it will occur that we can merge this PR, I think we can safely say that 2.3 decreases both memory and CPU usage, we get rid of unnecessary loops and in-memory objects. |
@guruz If we would use QVector with iterators, complexity is O(n). If so, I would rather stay with Caching as it is now and dont use this PR. |
@mrow4a Why do you think it is different for QVector vs QList? |
@guruz I think we would need to use the class which allows delete from any point in the list e.g. QLinkedList instead of QList(Qt custom implementation?) or QVector |
@mrow4a QList behaves exactly as QVector when sizeof(T) <= sizeof(void*). In both cases deleting from the front will only involve a memmove since pointers are POD types without destructors. A QLinkedList might scale better, but we need to be conservative with memory during sync, so as @ckamm mentioned, not sure either if it's worth it. If you can't see QVector operations in a profiler while syncing, it's definitely not worth it. |
@jturcotte I think we would be very conservative with memory if deleting every finished items compared to current Vector implementation which stores everything in memory. |
@jturcotte @mrow4a One could set the "deleted" pointer in the qlist/vector to 0 instead of using QLinkedList (EDIT: Although I also agree with @jturcotte that if it does not show up in profiling, just leave it as qvector as it was before.. even if QList is equivalent for pointers, minimize the code changes) |
@guruz yes this is another approach, however you would need to keep current cache implementation not to perform unnecesary loops Hmm, I actualy not sure about that, I think memory still will be allocated to fit the previous object isnt it? EDIT: sorry, we store pointers, forget, but the previous comment is valid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes deleting finished jobs sounds like a good idea, but QLinkedList will also be longer to iterate (more memory deferences) and cause a lot of small allocations instead of one allocation for the vector (more fragmentations). Loops don't directly matter, it's CPU cycles that matter and I think that setting the pointer to NULL could work pretty well.
// peekPrevious() will directly access the item through hash in the QList at that subjob | ||
if (subJobsIterator.peekPrevious()->_state == Finished) { | ||
// if this items is finish, remove it from the _subJobs list as it is not needed anymore | ||
subJobsIterator.remove(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this leak the PropagatorJob and all its children? I'm surprised that you could just delete jobs without crashes, it's possible that you'll start facing issues if you actually delete the job. Please test thoroughly with very large syncs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand, we will now just set pointer to 0? As I understand, I should also call delete
on class pointer to delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds about right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove() does not delete, so setting to 0 does also not delete. You need to find out where it is deleted normally and check if you're not breaking anything by deleting here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, delete it. Currently I think it is a leak, also mentioned it in #5400 (comment)
@@ -604,6 +604,11 @@ bool PropagateDirectory::scheduleNextJob() | |||
if (_state == NotYetStarted) { | |||
_state = Running; | |||
|
|||
// at the begining of the Directory Job, update expected number of Jobs to be synced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please start your comments with a capital letter, and end them with a period.
1b375b1
to
0b59917
Compare
Ok, corrected the required things, also did a test with 5000 files in nested directories, deleting, moving, renaming, downloading, uploading etc. Works correctly. I also did a test syncing 1000 files in 10 directories on my local owncloud: Of course, increasing bookkeeping time higher the timing will not be that visible, but generaly it works faster, and because we delete jobs it should be also much more memory friendly. |
0b59917
to
4a70a23
Compare
// Note that in this case remove() from QVector will just perform memmove of pointer array items. | ||
PropagatorJob * jobPointer = subJobsIterator.value(); | ||
subJobsIterator.remove(); | ||
delete jobPointer; | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a moment in which I am removing the job from the queue and deleting the referenced object since it is not used anymore nowhere.
4a70a23
to
9179ba8
Compare
@mrow4a I'm busy today, but will review tomorrow! |
As per IRC, we want something like this in 2.3. |
9179ba8
to
55dab17
Compare
_firstJob.reset(); | ||
} else { | ||
bool removed = _subJobs.removeOne(subJob); | ||
// FIXME: Try debug build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (status == SyncFileItem::FatalError || | ||
(sender() == _firstJob.data() && status != SyncFileItem::Success && status != SyncFileItem::Restoration)) { | ||
(subJob == _firstJob.data() && status != SyncFileItem::Success && status != SyncFileItem::Restoration)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just cleared _firstJob earlier, so this would always be false!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, added a test too.
55dab17
to
434fcbb
Compare
434fcbb
to
7b892c5
Compare
Can I benchmark it before we merge? Tell me when ready. Can do it tomorrow morning. |
@mrow4a Unless I messed up the scheduling there is no way that you will notice a difference if you involve a PHP server that takes >100ms to respond for each file. I'll merge it tomorrow, but if you think you caught some issue let me know and we can revert it (could be non-perf related too). |
I need several minutes, to rerun the test several times for many files. I wont check it with HTTP2 though, should I? |
@mrow4a HTTP2 is a layer under, it won't have any effect on the scheduling beside the fact that the server might take maybe 10% less time to respond and trigger the next job. I simulated it with the unit tests that don't even involve a server and I couldn't see a difference (fake HTTP response posted to the event loop), so I doubt that it would make any difference whether you involve HTTP or HTTP2 regarding this patch. |
In HTTP2, I am talking about 100 files in parallel. In one connection |
@jturcotte Hmm, looking at your modified changes, I think you are right and it wont make a difference, since you do it in slotSubJobFinished(). In my version I did it in scheduleNextJob(), maybe this is why I saw a difference. I think you are right, that having it in slotSubJobFinished, backed that you did a local test, should not have any influence here. Will do the tests in case, but dont expect anything wild. 👍 |
@mrow4a OK, thanks for testing! |
@jturcotte @ogoffart For HTTP2, comparing branches 2.3 and 2.3-pre1, having sync of 1000 files of 1kB, total 1MB, 100 files in parallel to CERNBOX EOS, on WiFi and 52 ms latency, repeating 5 times, in average your PR drops upload from 19s to 20-21s. |
@mrow4a Please allow me to be skeptical, but I don't see how this patch could lead to those results, not for a vector of less than 1000 elements. Could you paste the 5 run times of each configuration? What's the standard deviation between runs? You can't profile CPU-bound code with network times, it's like trying to catch a flea with a fishing net. |
https://cloudstor.aarnet.edu.au/plus/index.php/s/jWBwy6gm2ZtqSYi But yes, it was done first one set, then the other, so it could be also variation of the system itself (which EOS has a lot during the day). I should do it interleaving runs and at night though to be sure. I also saw a run in 2.3 which took 17+s so it could be that, 100 files in parallel is probably hard to get reproducable results in production system. |
I placed also results for HTTP1 in shared file, and I indeed can see that especially for http1 with 6 connections the veriations can be quite big, so I dont even try to deduce sth from there. However they oscillate around the one line. Bet that this for HTTP1 does not make any difference, maybe for HTTP2 with a lot of files in parallel though |
Anyways, I think the benefit of this is bigger than the use case of not having this PR. I just remember that during test we had a little bigger difference in the previous version of this PR in HTTP2 so it alarmed me. Nevertheless, 👍 |
Delete finished job in the PropagateDerectory scheduleNextJob.
This should reduce code complexity from #5274 and solve the issue #5269